
server: rm gossip usage from node/store/cluster ID allocation #56263

Merged
merged 1 commit into cockroachdb:master from 201103.rm-gossip-start on Nov 9, 2020

Conversation

irfansharif
Contributor

In 20.2, with #52526, we introduced the join RPC, which lets new nodes in
the system be handed a pre-allocated node ID and a pre-allocated store ID,
and be informed by the existing cluster of the cluster ID.
In 20.1 and earlier this functionality was provided by our use of gossip:
we constructed hollowed-out servers (without node/store IDs) and allocated
the IDs at the joining node itself. This required an awkward dance: KV had
to be up in order to allocate the right IDs, yet the server process did
not yet have an ID of its own.

We retained the deprecated gossip codepaths in 20.2 in order to maintain
compatibility with 20.1. Now that 20.2 is cut, however, in 21.1 code we
can always assume that we're talking to nodes that are at least 20.2,
and therefore nodes that are able to use the join RPC correctly. This
lets us strip out our usage of gossip for ID allocation, which in turn
paves the way for a few other simplifications.
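
For illustration, here's a minimal sketch of the shape of that handshake; the type and method names below are hypothetical stand-ins, not the actual definitions in pkg/roachpb or pkg/server:

```go
package example

// JoinRequest and JoinResponse are hypothetical stand-ins for the real
// join RPC messages: the joining node announces its binary version and,
// if accepted, is handed its IDs by the existing cluster.
type JoinRequest struct {
	BinaryVersion string // joining node's binary version
}

type JoinResponse struct {
	ClusterID string // cluster ID the joining node should adopt
	NodeID    int32  // node ID pre-allocated by the existing cluster
	StoreID   int32  // store ID pre-allocated for the node's first store
}

// joiner abstracts "an already-initialized node we can reach".
type joiner interface {
	Join(JoinRequest) (JoinResponse, error)
}

// joinCluster is what a 20.2+ node conceptually does at startup: instead
// of allocating IDs itself via gossip-assisted KV access, it receives
// them from the existing cluster.
func joinCluster(existing joiner, binaryVersion string) (JoinResponse, error) {
	return existing.Join(JoinRequest{BinaryVersion: binaryVersion})
}
```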

We also delete the mixed-version/join-init roachtest, as it was only
ever relevant for the 20.1/20.2 cycle. The current structure around
joining nodes being allocated/informed of the right IDs has adequate
coverage with TestClusterConnectivity.

Release note: None

@irfansharif irfansharif requested review from knz and tbg November 3, 2020 21:01
@irfansharif irfansharif requested a review from a team as a code owner November 3, 2020 21:01
@cockroach-teamcity
Member

This change is Reviewable

Contributor Author

@irfansharif irfansharif left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/server/node.go, line 561 at r1 (raw file):

		// first store ID has already been pre-allocated for us.
		//
		// TODO(irfansharif): Is this sound? If we're restarting an already

Sigh, this is unsound on master, #56272. Good thing we didn't officially support multiple stores in 20.2.

#56272 and these TODOs are unrelated to this PR though, I just noticed given I was in the area. I'll work on cleaning them up separately.

Member

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/server/init.go, line 181 at r1 (raw file):

// on persisted data instead of this flag.
//
// [1]: This is not technically true for mixed version clusters where we leave

Hmm, isn't this gone now, too?


pkg/server/init.go, line 375 at r1 (raw file):

		if errors.Is(err, ErrIncompatibleBinaryVersion) {
			// Propagate upwards; these is an error condition the caller knows

nit: this is


pkg/server/node.go, line 389 at r1 (raw file):

	// Inform the RPC context of the node ID.
	n.storeCfg.RPCContext.NodeID.Set(ctx, nodeID)

This is done elsewhere already?


pkg/server/node.go, line 561 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Sigh, this is unsound on master, #56272. Good thing we didn't officially support multiple stores in 20.2.

#56272 and these TODOs are unrelated to this PR though, I just noticed given I was in the area. I'll work on cleaning them up separately.

I mentioned this in #56272, but I'm looking again and still don't see the problem. storeIDAlloc is the number of storeIDs we still need. We pass that to allocateStoreIDs which gets us that many IDs from the KV layer. startID is the first ID we got back.

Oh, is the problem that the Join RPC also allocated a NodeID/StoreID for us but that we're not properly persisting it? I just briefly looked at the code and was expecting the initServer to actually bootstrap the store with the NodeID/ClusterID/StoreID but now I'm not sure where that happens. Oh, it's in this code via firstStoreID? OK, that is all surprising and now I can see why you're concerned. This does need to be cleaned up. I would generally prefer if the auxiliary and first store were treated separately, just to get the code clean. In particular, it looks like we're bootstrapping the first store asynchronously?!
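
For what it's worth, a minimal sketch of the arithmetic being discussed here; the helper and signature below are illustrative only (the real allocator in pkg/server is built on a KV increment):

```go
package example

import "context"

// incrementer stands in for the KV operation that atomically bumps the
// store-ID generator key by a delta and returns the new value.
type incrementer func(ctx context.Context, delta int64) (newValue int64, err error)

// allocateStoreIDs reserves n consecutive store IDs and returns the first
// one, mirroring the startID/storeIDAlloc discussion above: bumping the
// counter by n claims the block [newValue-n+1, newValue], and callers
// hand out startID, startID+1, ... to the stores being bootstrapped.
func allocateStoreIDs(ctx context.Context, inc incrementer, n int64) (startID int64, err error) {
	newValue, err := inc(ctx, n)
	if err != nil {
		return 0, err
	}
	return newValue - n + 1, nil
}
```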


pkg/server/server.go, line 1396 at r1 (raw file):

	s.rpcContext.ClusterID.Set(ctx, state.clusterID)
	s.rpcContext.NodeID.Set(ctx, state.nodeID)

ah, here it is.

@tbg tbg self-requested a review November 4, 2020 14:42
Member

@tbg tbg left a comment

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)

Contributor Author

@irfansharif irfansharif left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/server/node.go, line 561 at r1 (raw file):
Like I mentioned in #56271 (comment), I think our saving grace here around the number of store IDs we allocated was accidental.

In particular, it looks like we're bootstrapping the first store asynchronously?!

Yup, yikes. Here's what I think the new code structure should be.

- In the init code:
  - If we're being bootstrapped:
    - Only initialize first store, leave remaining stores to start up code
      later
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code
      later

- Later, bootstrapping additional new stores:
  - Allocate len(auxiliary engines) store IDs, and bootstrap them
    asynchronously. 

This should let us avoid threading in the first store ID, and always rely on
the KV increment operation to tell us what the first auxiliary store ID should
be.

Cases to consider:
- [x] Bootstrapping first node with single engine
- [x] Bootstrapping first node with multiple engines
- [x] Joining a bootstrapped cluster, with single engine
- [x] Joining a bootstrapped cluster, with multiple engines
- [x] Restarting a bootstrapped node with no new engines
- [x] Restarting a bootstrapped node with new engines

What do you think? Ignoring the gossip code paths here because I want to remove them anyway.
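
For concreteness, a rough sketch of the proposed split; the types and helpers below are hypothetical placeholders, not the actual pkg/server code:

```go
package example

import "context"

// engine and store stand in for the storage engine and the bootstrapped
// store; they're placeholders for illustration only.
type engine struct{ path string }
type store struct{ id int64 }

// initFirstStore is what the init code would do synchronously, whether we
// bootstrapped the cluster ourselves or were handed our first store ID
// through the join RPC.
func initFirstStore(first engine, firstStoreID int64) store {
	return store{id: firstStoreID}
}

// initAdditionalStores runs later, during node startup: it allocates
// len(auxiliary) consecutive store IDs in a single KV increment and
// initializes the remaining engines, which can safely happen
// asynchronously.
func initAdditionalStores(
	ctx context.Context,
	auxiliary []engine,
	allocateStoreIDs func(ctx context.Context, n int64) (startID int64, err error),
) ([]store, error) {
	if len(auxiliary) == 0 {
		return nil, nil
	}
	startID, err := allocateStoreIDs(ctx, int64(len(auxiliary)))
	if err != nil {
		return nil, err
	}
	stores := make([]store, 0, len(auxiliary))
	for i := range auxiliary {
		stores = append(stores, store{id: startID + int64(i)})
	}
	return stores, nil
}
```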

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Nov 4, 2020
...when joining an existing cluster. This diff adds some sanity around
how we bootstrap stores for nodes when we're informed by the existing
cluster what the first store ID should be. Previously we were
bootstrapping the first store asynchronously, and that is not what we
want.

We first observed the implications of doing so in cockroachdb#56263, which
attempted to remove the use of gossip in cluster/node ID distribution.
There we noticed that our somewhat haphazard structure around
initialization of stores could lead to doubly allocating store IDs
(cockroachdb#56272). We were inadvertently safeguarded against this, as described
in cockroachdb#56271, but this structure is still pretty confusing and needed
cleanup.

Now the new store initialization structure is the following:
```
- In the init code:
  - If we're being bootstrapped:
    - Initialize all the stores
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code
      later

- Later, when initializing additional new stores:
  - Allocate len(auxiliary engines) store IDs, and initialize them
    asynchronously.
```

This lets us avoid threading in the first store ID, and always rely on
the KV increment operation to tell us what the store ID should be for
the first additional store. We update TestAddNewStoresToExistingNodes to
test allocation behaviour with more than just two stores.

Eventually we could simplify the init code to only initialize the first
store when we're bootstrapping (there's a longstanding TODO from Andrei
to that effect), but it's not strictly needed. This PR unblocks cockroachdb#56263.

Release note: None
craig bot pushed a commit that referenced this pull request Nov 5, 2020
56299: server: initialize first store within the init server r=irfansharif a=irfansharif

rytaft pushed a commit to rytaft/cockroach that referenced this pull request Nov 5, 2020
@irfansharif irfansharif force-pushed the 201103.rm-gossip-start branch 2 times, most recently from b533630 to 1d97fb3 Compare November 6, 2020 05:06
Contributor Author

@irfansharif irfansharif left a comment

PTAL, rebased atop #56299 and #56302 (first commit here, to be ignored).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/server/init.go, line 181 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm, isn't this gone now, too?

Done.


pkg/server/node.go, line 561 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Like I mentioned in #56271 (comment), I think our saving grace here around the number of store IDs we allocated was accidental. […]

Cleaned up across #56302 and #56299.

Member

@tbg tbg left a comment

so satisfying to see this cleaned up.

Reviewed 2 of 8 files at r2, 6 of 6 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)

@irfansharif irfansharif force-pushed the 201103.rm-gossip-start branch from 1d97fb3 to 967361f Compare November 6, 2020 13:48
@irfansharif
Contributor Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Nov 6, 2020

Build failed:

@irfansharif irfansharif force-pushed the 201103.rm-gossip-start branch from 967361f to dab8955 Compare November 9, 2020 16:41
@irfansharif
Contributor Author

I'm a bit lost with the CI failures here. Looks like it's timing out for some of our docker acceptance tests? See here. But I tried running them on my gceworker:

```
$ make acceptance TESTS=TestDockerRuby PKG=./pkg/acceptance TESTFLAGS='-v -show-logs'
...
--- PASS: TestDockerRuby (10.07s)
    --- PASS: TestDockerRuby/runMode=docker (5.46s)
    --- PASS: TestDockerRuby/runMode=docker#01 (4.61s)
PASS
```

And they work? I'm a bit lost. @knz, @tbg: any ideas what I'm missing? I took a cursory glance at our acceptance testing harness setup, and nothing stands out to me. And the logs aren't super helpful. I've kicked off another run, but I'm pretty sure it'll fail as well (it has before).

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Nov 9, 2020

Build succeeded:

@craig craig bot merged commit 791cae9 into cockroachdb:master Nov 9, 2020
@irfansharif irfansharif deleted the 201103.rm-gossip-start branch November 9, 2020 19:49
@irfansharif
Contributor Author

🤷‍♀️

@knz
Contributor

knz commented Nov 10, 2020

very nice piece of work 💯 🎉
